AITopics

Neural Information Processing Systems, Dec-24-2025, 08:57:42 GMT

Mining Multi-Label Samples from Single Positive Labels

Conditional generative adversarial networks (cGANs) have shown superior results in class-conditional generation tasks. To simultaneously control multiple conditions, cGANs require multi-label training datasets, where multiple labels can be assigned to each data instance. Nevertheless, the tremendous annotation cost limits the accessibility of multi-label datasets in real-world scenarios. Therefore, in this study we explore the practical setting called the single positive setting, where each data instance is annotated by only one positive label with no explicit negative labels. To generate multi-label data in the single positive setting, we propose a novel sampling approach called single-to-multi-label (S2M) sampling, based on the Markov chain Monte Carlo method. As a widely applicable "add-on" method, our proposed S2M sampling method enables existing unconditional and conditional GANs to draw high-quality multi-label data with a minimal annotation cost. Extensive experiments on real image datasets verify the effectiveness and correctness of our method, even when compared to a model trained with fully annotated datasets.

mining multi-label sample, name change, single positive label, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jul-10-2025

X-ray transferable polyrepresentation learning

Hryniewska-Guzik, Weronika, Biecek, Przemyslaw

The success of machine learning algorithms is inherently related to the extraction of meaningful features, as they play a pivotal role in the performance of these algorithms. Central to this challenge is the quality of data representation. However, the ability to generalize and extract these features effectively from unseen datasets is also crucial. In light of this, we introduce a novel concept: the polyrepresentation. Polyrepresentation integrates multiple representations of the same modality extracted from distinct sources, for example, vector embeddings from the Siamese Network, self-supervised models, and interpretable radiomic features. This approach yields better performance metrics than relying on a single representation. Additionally, in the context of X-ray images, we demonstrate the transferability of the created polyrepresentation to a smaller dataset, underscoring its potential as a pragmatic and resource-efficient approach in various image-related solutions. It is worth noting that the concept of polyrepresentation, illustrated here on medical data, can also be applied to other domains, showcasing its versatility and broad potential impact.
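As a rough illustration of what "integrating multiple representations" could look like in code, the sketch below concatenates per-source feature vectors for one image into a single vector. The abstract does not say how the representations are combined, so concatenation, the L2 normalization step, and the names `l2_normalize`, `build_polyrepresentation`, and the source keys are all assumptions made for this example.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are copied unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_polyrepresentation(representations):
    """Concatenate per-source feature vectors for one image into one vector.

    `representations` maps a source name to that source's feature vector.
    Concatenation is an assumed fusion strategy, not taken from the paper.
    """
    combined = []
    # Iterating in sorted key order keeps dimensions aligned across images.
    for name in sorted(representations):
        combined.extend(l2_normalize(representations[name]))
    return combined


# Hypothetical feature vectors for a single X-ray image.
example = {
    "siamese": [3.0, 4.0],
    "self_supervised": [0.0, 2.0, 0.0],
    "radiomics": [1.0],
}
poly = build_polyrepresentation(example)
```

Normalizing each source before concatenating keeps a source with large raw magnitudes from dominating the combined vector; whether that is appropriate depends on the downstream model.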

artificial intelligence, machine learning, representation, (18 more...)

2507.06264

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial Intelligence, Jul-4-2025

Positive region preserved random sampling: an efficient feature selection method for massive data

Bai, Hexiang, Li, Deyu, Liang, Jiye, Zhai, Yanhui

Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally do not have enough computing resources when faced with huge volumes of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, this paper proposes using the ratio of discernible object pairs to all object pairs that should be distinguished to measure the discriminatory ability of a feature set. Based on this measure, a new feature selection method is proposed. This method constructs positive region preserved samples from massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it is able to select a feature subset that can preserve the discriminatory ability of all the features of the target massive data set within an acceptable time on a personal computer. Second, a lower bound on the probability that an object pair that should be distinguished can be discerned by the selected feature subset can be estimated before finding reducts. Furthermore, 11 data sets of different sizes were used to validate the proposed method. The results show that approximate reducts can be found in a very short period of time, and the discriminatory ability of the final reduct is larger than the estimated lower bound. Experiments on four large-scale data sets also showed that an approximate reduct with high discriminatory ability can be obtained in reasonable time on a personal computer.
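The discriminatory-ability measure described above can be sketched in a few lines. This is a minimal brute-force illustration, assuming (as is usual in rough set theory, though the abstract does not spell it out) that the pairs "that should be distinguished" are pairs of objects with different decision values, and that a pair is discernible when the two objects differ on at least one selected feature. The function name `discernibility_ratio` and the toy data are hypothetical; the sampling step the paper proposes is not shown.

```python
from itertools import combinations


def discernibility_ratio(rows, decisions, feature_subset):
    """Ratio of discernible object pairs to pairs that should be distinguished.

    rows           -- list of feature tuples, one per object
    decisions      -- list of decision (class) values, one per object
    feature_subset -- indices of the features being evaluated
    """
    must_distinguish = 0
    discerned = 0
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue  # same decision value: this pair need not be told apart
        must_distinguish += 1
        # The pair is discernible if any selected feature takes different values.
        if any(rows[i][f] != rows[j][f] for f in feature_subset):
            discerned += 1
    # With no pairs to distinguish, any subset trivially preserves everything.
    return discerned / must_distinguish if must_distinguish else 1.0


# Toy decision table: two features, four objects.
rows = [(0, 0), (0, 1), (1, 1), (1, 1)]
decisions = ["a", "b", "b", "a"]

ratio_first = discernibility_ratio(rows, decisions, [0])    # 0.5
ratio_both = discernibility_ratio(rows, decisions, [0, 1])  # 0.75
```

The pairwise loop is quadratic in the number of objects, which is exactly why evaluating this measure on the full data becomes impractical at scale and why a sampling-based estimate is attractive.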

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2507.01998

Country:

Asia > China > Shanxi Province (0.14)
Europe > Poland > Masovia Province > Warsaw (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Neural Information Processing Systems, May-27-2025, 06:57:26 GMT

Predicting Label Distribution from Ternary Labels

Label distribution learning is a powerful learning paradigm to deal with label polysemy and has been widely applied in many practical tasks. A significant obstacle to the effective utilization of label distribution is the substantial expense of accurately quantifying label distributions. To tackle this challenge, label enhancement methods automatically infer label distributions from more easily accessible multi-label data based on binary annotations. However, the binary annotation of multi-label data requires experts to accurately assess whether each label can describe the instance, which may diminish annotation efficiency and heighten the risk of erroneous annotation, since the relationship between the label and the instance is unclear in many practical scenarios. Therefore, we propose to predict label distribution from ternary labels, allowing experts to annotate labels in a three-way annotation scheme.

binary annotation, label distribution, ternary label, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

arXiv.org Machine Learning, Dec-21-2024

Batch Selection for Multi-Label Classification Guided by Uncertainty and Dynamic Label Correlations

Zhou, Ao, Liu, Bin, Wang, Jin, Tsoumakas, Grigorios

The accuracy of deep neural networks is significantly influenced by the effectiveness of mini-batch construction during training. In single-label scenarios, such as binary and multi-class classification tasks, it has been demonstrated that batch selection algorithms preferring samples with higher uncertainty achieve better performance than difficulty-based methods. Although there are two batch selection methods tailored for multi-label data, neither of them leverages important uncertainty information. Adapting the concept of uncertainty to multi-label data is not a trivial task, since two issues must be tackled. First, traditional variance or entropy-based uncertainty measures ignore fluctuations of predictions within sliding windows and the importance of the current model state. Second, existing multi-label methods do not explicitly exploit the label correlations, particularly the uncertainty-based label correlations that evolve during the training process. In this paper, we propose an uncertainty-based multi-label batch selection algorithm. It assesses uncertainty for each label by considering differences between successive predictions and the confidence of current outputs, and further leverages dynamic uncertainty-based label correlations to emphasize instances whose uncertainty is synergistically expressed across multiple labels. Empirical studies demonstrate the effectiveness of our method in improving the performance and accelerating the convergence of various multi-label deep learning models.

artificial intelligence, machine learning, prediction, (15 more...)

2412.16521

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Neural Information Processing Systems, Oct-11-2024, 09:37:19 GMT

Mining Multi-Label Samples from Single Positive Labels

dataset, mining multi-label sample, single positive label, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-25-2024

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Huang, Yanyong, Yang, Li, Wang, Dongjie, Li, Ke, Yi, Xiuwen, Lv, Fengmao, Li, Tianrui

Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select features that are discriminative yet mutually uncorrelated, while simultaneously maintaining consistency between predicted and ground-truth labels in labeled data. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.

correlation, feature selection, selection, (16 more...)

2406.12193

Country:

North America > United States > Kansas > Douglas County > Lawrence (0.14)
Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

Neural Information Processing Systems, Mar-14-2024, 22:42:41 GMT

Maximum Margin Multi-Instance Learning

Wang, Hua

Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization, the semantic meanings (labels) of an image mostly arise from its regions (instances) rather than from the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which is often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags.
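To make the cost of Bag-to-Bag comparison concrete, here is a minimal sketch of one common B2B distance, the Hausdorff distance between two bags of instance vectors. It is offered only as an example of the B2B family the abstract refers to, not as the paper's C2B distance, whose definition is not given here. The function names and the toy bags are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed_hausdorff(bag_a, bag_b):
    """Largest distance from any instance in bag_a to its nearest instance in bag_b."""
    return max(min(dist(a, b) for b in bag_b) for a in bag_a)


def hausdorff_distance(bag_a, bag_b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(bag_a, bag_b), directed_hausdorff(bag_b, bag_a))


# Two toy bags, each a list of 2-D instance vectors (e.g. region features).
bag_a = [(0.0, 0.0), (1.0, 0.0)]
bag_b = [(0.0, 0.0), (4.0, 0.0)]

d_ab = hausdorff_distance(bag_a, bag_b)  # 3.0
```

Each comparison touches every instance pair across the two bags, and a B2B-based model needs such a comparison for many bag pairs, which is where the computational expense comes from.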

classification, distance metric, multi-instance data, (16 more...)

Country: North America > United States > Texas (0.05)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Mar-1-2024

Embedded Multi-label Feature Selection via Orthogonal Regression

Xu, Xueyuan, Wei, Fulin, Jia, Tianyuan, Zhuo, Li, Nie, Feiping, Wu, Xia

In the last decade, embedded multi-label feature selection methods, which incorporate the search for feature subsets into model optimization, have attracted considerable attention for accurately evaluating the importance of features in multi-label classification tasks. Nevertheless, the state-of-the-art embedded multi-label feature selection algorithms based on least squares regression usually cannot preserve sufficient discriminative information in multi-label data. To tackle this challenge, a novel embedded multi-label feature selection method, termed global redundancy and relevance optimization in orthogonal regression (GRROOR), is proposed. The method employs orthogonal regression with feature weighting to retain sufficient statistical and structural information related to local label correlations of the multi-label data in the feature learning process. Additionally, both global feature redundancy and global label relevancy information are considered in the orthogonal regression model, which contributes to the search for discriminative and non-redundant feature subsets in the multi-label data. The cost function of GRROOR is an unbalanced orthogonal Procrustes problem on the Stiefel manifold. A simple yet effective scheme is utilized to obtain an optimal solution. Extensive experimental results on ten multi-label data sets demonstrate the effectiveness of GRROOR.

feature selection, multi-label feature selection method, selection, (10 more...)

2403.00307

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)